Law enforcement agencies face a widespread problem of corruption, which
jeopardizes their credibility and institutional integrity. The primary goal
of this study is therefore to develop a machine learning prediction model for
petty corruption intentions as an early warning system for law enforcement
officials who fail to perform their duties and obligations with integrity.
Using a questionnaire survey of 225 participants, ranging from senior
officers to rank-and-file police officers, this study presents fundamental
knowledge on the design and implementation of machine learning models
based on six selected algorithms: generalized linear model, fast large margin,
decision tree, random forest, gradient boosted trees, and support vector
machine. In addition to demographic factors, the efficacy of each machine
learning algorithm in predicting petty corruption was evaluated using general
strain theory (GST) attributes: financial stress, work stress, leadership
pressure, and peer pressure. The findings indicate that peer pressure carried
the highest weight in most of the machine learning algorithms. Gradient
boosted trees achieved the best classification accuracy, above 90%. This paper
presents useful knowledge to support the implementation of intelligent
corruption detection tools.
1. INTRODUCTION


Corruption is a major issue in many law enforcement agencies around the world [1], [2]. This
institutional disease halts the application of the rule of law and denies equitable access, putting the country's
security, justice, and economic progress at risk. Corruption is generally defined as the abuse of authority by
civil servants for private gain and interests [3]. According to previous research [4], [5], there are two categories
of corruption: grand corruption and petty corruption. Grand corruption is frequently connected with
high-ranking public officials, whose actions may harm the long-term functioning of the economy. For example,
high-level influence over the awarding of government contracts for the construction of new schools may result
in unqualified companies controlled by cronies of high-level public officials receiving these contracts. Petty
corruption, often known as bureaucratic corruption, involves low-level public officials who cause public
benefit to be distorted. Examples of petty corruption include bribery, embezzlement, and favoritism.

As with other law enforcement authorities, corruption is one of the key difficulties that police
institutions face [6], [7], and it undermines the legitimacy of the agency. Because the police enforce the rule
of law of the country, corruption is often entwined with them: criminal networks make extensive use of the
institution to carry out criminal activity, avoid investigation, and escape prosecution [8]. According to Kleinig
[9], police officers commit corruption when they exercise or neglect to exercise their authority with the primary
goal of advancing personal or departmental gain. Making false reports and committing perjury, shielding illicit
gaming, theft of drugs on the street, theft of confiscated property, receiving discounts on purchases, and selling
information about police operations are the most typical forms of corruption among police officials [10].
Furthermore, according to the Transparency International Global Corruption Barometer [8], police officers are
the most frequently reported recipients of bribes. Transparency International's Advocacy and Legal Advice
Centre received over 1,500 reports of police and military abuse during the COVID-19 outbreak [11]. Levi [11]
adds that these corruption reports stemmed from bribes demanded by police and soldiers from civilians who
passed through checkpoints, stayed out past curfew, or wanted to leave a quarantine center.

Efforts against corruption, both petty and grand, have been implemented by most governments around
the world. In Malaysia, the government has launched the National Anti-Corruption Plan (NACP) 2019-2023,
with the goal of making the country free of corruption by 2023. The NACP was developed with practical goals
based on the initiatives undertaken by government and private agencies to address issues around corruption,
integrity, and governance over the five-year period.

Despite the government's effort to fight corruption in the country, Malaysia was ranked 61st out of
180 countries in the 2021 corruption perception index (CPI), with a score of 52 out of 100. The CPI
measures the public's opinion of official corruption in a given country. A lower CPI score indicates that the
public perceives a greater likelihood that public officials are involved in corruption. Furthermore,
according to the Enforcement Agency Integrity Commission's (EAIC) annual report 2021 [12], the Royal
Malaysia Police (RMP) receives the most public complaints of misconduct, including corruption, among
Malaysia's law enforcement authorities. Since the EAIC commenced in 2004, more than 75% of the
institution's total misconduct complaints every year have involved RMP officers.

Given this, there is an urgent need to identify more effective measures to combat police corruption.
There have been very few studies on police corruption in Malaysia so far. Hence, this study aims to add to the
existing body of knowledge by investigating the use of machine learning classification algorithms for
predicting RMP petty corruption intentions using four general strain theory (GST) attributes: financial stress,
work stress, leadership pressure, and peer pressure. This study focuses on petty corruption because [8] argue
that it is more dangerous than grand corruption, as it might become institutionalized and lead to more serious
misconduct.

This work makes two major contributions. Firstly, it attempts to build on previous work [13],
[14] by presenting evidence on a machine learning-based police petty corruption intention prediction model in
a developing and non-western research setting: Malaysia. This study chooses Malaysia as the research
setting because [10] stress that police corruption is more widespread and obvious in developing countries than
in developed countries. Therefore, it is critical to explore the options of using automated prediction of
corruption based on intelligent software tools. Secondly, this paper presents a new design and execution of
machine learning prediction of police petty corruption based on GST constructs.

The rest of the paper is laid out as follows. Section 2 discusses previous research on
corruption prediction that used machine learning techniques. The data set for this investigation and the
machine learning technique are detailed in section 3. The experimental findings for each algorithm are shown
and discussed in section 4. The summary and conclusions are presented in the concluding section.


2. LITERATURE REVIEW 

According to a review of the literature on corruption, most previous research relied on correlations
and regression-based statistical techniques [15], [16]. Recently, however, the use of artificial intelligence
and machine learning approaches in identifying and comprehending governmental fraud and corruption has
gained traction in the literature [13], [14], owing to their ability to discover trends and patterns in a wide range
of data [13]. This paved the way for real-time surveillance and identification of fraud and corruption. For
example, [17] used three types of data mining approaches to predict corruption perceptions in 132 countries:
random forest, support vector machine, and artificial neural networks. The data set used by the study was
gathered from the World Bank, Transparency International, and the Heritage Foundation for the years 2017
and 2018. The results show that random forest, an ensemble-type machine learning method, was the most
accurate classification model, followed by support vector machines and artificial neural networks, with overall
accuracy of 85.77%, 76.15%, and 73.84% respectively.
Further, a study by Rabuzin and Modrušan [18] aimed to predict indications of corruption in the public
procurement process using text-mining techniques and machine learning methods such as naive Bayes,
logistic regression, and support vector machines. As the data set for the prediction model, the study used the
content of tender documentation from the Croatian procurement portal, grouped into the following categories:
food, beverages, tobacco and related products; medical equipment, pharmaceuticals and personal care
products; construction work; repair and maintenance services; architecture, construction, engineering and
inspection services; health and social work services; sewage, refuse, cleaning and environmental services; and
IT services. The results indicate that support vector machines and logistic regression were better at producing
predictions linked to health and social work services, whereas the naive Bayes algorithm performed better in
almost all other groupings.

Meanwhile, using police archives, [19] deployed machine learning algorithms to predict corruption
crimes in Italian municipalities from 2012 to 2014. The study correctly identified over 70% of the
municipalities that would go on to have corruption incidents. In line with [17]-[19], [14] aims to develop a
robust predictive machine learning model of local-government corruption in Brazil. That study used budget
accounts data of Brazilian municipalities from 2003 to 2010. Employing a gradient boosted classifier that
consisted of an ensemble of decision trees, the model was able to detect the existence and predict the intensity
of corruption with an accuracy of 76% and an AUC of 0.834.

This study aims to extend prior works [14]-[17] by utilizing GST attributes to construct machine
learning prediction models of police petty corruption intentions in Malaysia. Robert Agnew's GST, a
criminology theory, has often been used to better understand the phenomenon of white-collar crime, including
corruption [20]. The theory holds that stress and pressure are significant predictors of workplace wrongdoing.
This is because strain or stress often leads to unfavorable feelings such as anger, frustration, depression, and
despair which, in turn, create pressure for corrective action, with crime or delinquency being one possible
response [21]. In police jobs, stress and pressure appear to be a substantial component, so GST could provide
a robust explanation for police behavior and the intention to commit petty corruption. Following prior research
[20], [21], this study uses four GST stress/pressure attributes, namely financial stress, work stress, peer
pressure, and leadership pressure.


3. METHOD 
3.1. Sample of data 

The data for this study was gathered from 225 RMP officials using a questionnaire survey. The
collected data can be used to evaluate different prediction models that use different machine learning
algorithms. The questionnaire developed for collecting the data had two sections. The first section
consisted of gender, age, race, marital status, education level, work experience, department, and position level
as proxies for demographic factors. The second section featured the four GST attributes that contribute to
petty corruption. To measure each attribute of the theory and the petty corruption intention, several indicators
were developed and employed based on previous studies [22]-[29]. The questionnaire used a five-point Likert
scale, allowing respondents to choose whether they strongly disagree, disagree, are neutral, agree, or strongly
agree with each of these indicators. The average of each construct's indicator values was used as the estimate
for that construct. Table 1 shows the indicators of the GST attributes and of petty corruption, as well as the
source of the measurements.


Table 1. Attributes of GST

Attributes            Description                                                                          References
Work stress           10 indicators to measure the degree of work stress among police officials.           [22], [23]
Financial stress      8 indicators to measure the degree of financial stress among police officials.       [24]
Leadership pressure   10 indicators to measure the degree of leadership pressure among police officials.   [25], [26]
Peer pressure         10 indicators to measure the degree of peer pressure among police officials.         [27], [28]
Petty corruption      5 indicators to measure the police officials' intention on petty corruption.         [29]


The total mean of the petty corruption indicators was used to specify the petty corruption intention among
the RMP officials. If the total mean is equal to or above 2.5, the officer is considered to have the intention
(class 1). Otherwise, the intention is categorized as class 0, or no intention of petty corruption. Correspondingly,
if a prediction model yields a prediction probability of 0.5 or above, the model classifies the officer as class 1,
meaning a possible intention of petty corruption. Table 2 lists the distribution of the dependent variable (DV)
in the dataset, which shows that the majority of the data belongs to class 0, or no intention of petty corruption.
Figure 1 depicts examples of predictions from one of the prediction models executed in RapidMiner software.
Table 2. Distribution of the dependent variable (DV)

Class                               Label in RapidMiner   Count   Percentage
0 (no petty corruption intention)   Range1                181     80.44%
1 (petty corruption intention)      Range2                44      19.56%
[Figure 1: RapidMiner prediction output with the columns Row No., DV, prediction(DV), confidence(range2), and confidence(range1); row values not recoverable]

Figure 1. The prediction samples


Take, for example, the data at row 1: the real case is categorized as range 1 or class 0 (no petty
corruption intention), and the model predicts the case correctly because the confidence of the prediction for
range 1 is 0.82. In row 2, by contrast, the real case is range 2 (class 1), but the confidence for range 2 is 0.64
that caused the prediction model to set the prediction value as range 1 (class 0). In this case, the model wrongly
predicts the class.
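
The two thresholds described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample respondents are hypothetical, and the study itself performed these steps in RapidMiner rather than in code.

```python
from statistics import mean

LABEL_THRESHOLD = 2.5  # mean Likert score at or above this value -> class 1 (from the text)
PROB_THRESHOLD = 0.5   # predicted probability at or above this value -> class 1 (from the text)


def label_intention(indicator_scores):
    """Derive the DV from the petty corruption indicators of one respondent.

    Returns 1 (intention) when the mean score is >= 2.5, otherwise 0.
    """
    return 1 if mean(indicator_scores) >= LABEL_THRESHOLD else 0


def classify(prob_class_1):
    """Turn a model's class 1 probability into a predicted class."""
    return 1 if prob_class_1 >= PROB_THRESHOLD else 0


# Hypothetical respondents, each with five Likert answers (1 to 5).
print(label_intention([1, 2, 1, 2, 1]))  # mean 1.4, below the threshold
print(label_intention([3, 2, 3, 2, 3]))  # mean 2.6, at or above the threshold

# Row 1 of the example: confidence 0.82 for range 1 leaves 0.18 for class 1.
print(classify(0.18))
```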


3.2. Attributes of the prediction models 

Figure 2 presents the demographic and GST attributes that were selected as the independent
variables (IVs) in predicting the petty corruption intention DV. Based on the Pearson correlation test, most
attributes have a low to very weak correlation with the DV. Nevertheless, all of the IVs were considered to
carry some information that the machine learning algorithms can use in making predictions.
The differences in how the IVs contribute to the prediction models built with the different machine learning
algorithms are discussed in the results section.


[Figure 2: horizontal bar chart of the correlation coefficient (horizontal axis, 0.000 to 0.600) for each attribute (vertical axis). Legible values: working experience 0.012, gender 0.002, academic qualification 0.095; the remaining bars, including financial stress and work stress, are not recoverable.]

Figure 2. Correlation of each IV to DV outside the prediction model
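
As an illustration of the correlation test behind Figure 2, the following minimal Python sketch computes a Pearson coefficient between one attribute and the binary DV. The function name and the sample values are hypothetical and are not taken from the study's dataset. With a binary DV, the Pearson coefficient coincides with the point-biserial correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Hypothetical mean peer pressure scores and DV classes for six respondents.
peer_pressure = [1.2, 2.0, 3.5, 4.1, 2.8, 1.5]
dv = [0, 0, 1, 1, 0, 0]
print(round(pearson_r(peer_pressure, dv), 3))
```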


3.3. The machine learning algorithms 

Six machine learning algorithms, namely generalized linear model, fast large margin, decision tree,
random forest, gradient boosted trees, and support vector machine, were executed in RapidMiner software
on a computer with 16 GB of RAM. The six algorithms were suggested by RapidMiner Auto Model,
and the optimal hyper-parameter settings are given in Table 3. Not available (NA) denotes that the
algorithm does not use the given parameter.


Table 3. Optimal hyper-parameters

Algorithm                C      Number of trees   Maximal depth   Learning rate   Gamma   Error rate (%)
Fast large margin        1      NA                NA              NA              NA      20
Decision tree            NA     NA                2               NA              NA      17.8
Random forest            NA     20                7               NA              NA      17.8
Gradient boosted trees   NA     90                7               0.1             NA      14.8
Support vector machine   1000   NA                NA              NA              0.005   18.5


Fast large margin has C as its only hyper-parameter. Its worst error rate, 22.2%, occurred when C was
0.01, and the optimal value of C was 1, with an error rate of 20%. The decision tree has maximal depth
as the only hyper-parameter to be observed. The optimal maximal depth for the decision tree was 2, giving an
error rate of 17.8%. Similarly, the lowest error rate of random forest was 17.8%, at a maximal depth of 7.
Random forest has one more hyper-parameter besides maximal depth, namely the number of trees, for which
the optimal value was 20. Besides the number of trees and maximal depth, gradient boosted trees has a
learning rate; as listed in Table 3, the optimal values were 90, 7, and 0.1 respectively, achieving
an error rate of 14.8%. Support vector machine has C and gamma; the optimal value was 1,000 for C and
0.005 for gamma, giving a best error rate of 18.5%. The settings for the machine learning algorithms are
listed in Table 3; the generalized linear model involves no hyper-parameters.
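
The selection of the optimal settings follows the familiar pattern of trying each candidate value and keeping the one with the lowest error rate. The sketch below illustrates that pattern only; it is not RapidMiner's internal procedure. The `grid_search` helper and the lookup-based `evaluate_error` callback are hypothetical, and the two error rates are the ones quoted in the text for fast large margin.

```python
from itertools import product


def grid_search(param_grid, evaluate_error):
    """Return the parameter combination with the lowest error rate.

    param_grid maps a parameter name to its candidate values;
    evaluate_error takes a dict of parameters and returns an error rate.
    """
    names = list(param_grid)
    best_params, best_error = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        error = evaluate_error(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


# Error rates (%) quoted in the text for fast large margin, keyed by C.
reported_error = {0.01: 22.2, 1: 20.0}

best_params, best_error = grid_search(
    {"C": [0.01, 1]},
    lambda params: reported_error[params["C"]],  # stand-in for training and testing a model
)
print(best_params, best_error)
```

In practice the callback would train and evaluate a model for each combination; the lookup table simply keeps the example self-contained.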


3.4. Validation 

Another setting for the machine learning algorithms is the training and testing split technique. This
research divided the 225 records into 161 records for training and the remaining 64 as hold-out samples
for testing. As the prediction models classify two classes of petty corruption intention, a confusion matrix
was used to measure the algorithms' accuracy, recall, and precision. Figure 3 presents the confusion matrix for
the petty corruption classification in the prediction models, which can be described as follows:
- True Positive (TP) is the number of class 1 (petty corruption intention) cases correctly classified as 1.
- True Negative (TN) is the number of class 0 (no petty corruption intention) cases correctly classified as 0.
- False Positive (FP) is the number of class 0 (no petty corruption intention) cases incorrectly classified as 1.
- False Negative (FN) is the number of class 1 (petty corruption intention) cases incorrectly classified as 0.


                                             Real class 1          Real class 0          Class precision
Predicted as petty corruption intention      True Positive (TP)    False Positive (FP)   TP/(TP+FP)
Predicted as no petty corruption intention   False Negative (FN)   True Negative (TN)    TN/(TN+FN)
Class recall                                 TP/(TP+FN)            TN/(TN+FP)


Figure 3. Confusion matrix of the petty corruption classification in the prediction model


TP is the total number of correct predictions of class 1 (petty corruption intention), while FN is the
total number of class 1 cases falsely predicted as class 0. Conversely, TN is the total number of correct
predictions of class 0 (no petty corruption intention), while FP is the total number of class 0 cases falsely
predicted as class 1. Based on the confusion matrix, the accuracy can be measured with (1). Accuracy is the
proportion of correct predictions among all cases in the testing data. Because accuracy does not show how
well the prediction model classifies each class, precision and recall are also used. Precision for predicting the
intention of petty corruption is complemented by the recall for class 1.


Accuracy = (TP+TN)/(TP+TN+FP+FN) (1)
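
The measures defined by the confusion matrix can be computed directly from the four counts. The following Python sketch is illustrative only: the function name is hypothetical, and the example counts are those reported for fast large margin in Table 4. Returning 0.0 for an empty denominator is an assumption made here so that a model that never predicts class 1 gets a precision of 0.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, class precision, and class recall from confusion matrix counts."""

    def ratio(numerator, denominator):
        # Assumption: report 0.0 when a class is never predicted or never present.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision_class_1": ratio(tp, tp + fp),
        "recall_class_1": ratio(tp, tp + fn),
        "precision_class_0": ratio(tn, tn + fn),
        "recall_class_0": ratio(tn, tn + fp),
    }


# Counts reported for fast large margin in Table 4.
metrics = confusion_metrics(tp=6, fp=1, fn=6, tn=51)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```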
4. RESULTS AND DISCUSSION

The results are divided into two parts. Firstly, the performance of the machine learning algorithms in the
petty corruption intention prediction models is presented and discussed based on the generated confusion
matrices. Secondly, the variation of the correlation weights of each attribute across the prediction models is
presented. Table 4 lists the performance of each machine learning algorithm in the prediction models. Real is
the value taken from the collected data, while predicted is the value generated by the prediction models.


Table 4. Confusion matrix results

                                               Real class 1   Real class 0   Class precision   Accuracy

Generalized linear model                                                                       87.7%
  Predicted as petty corruption intention      7              2              77.78%
  Predicted as no petty corruption intention   6              50             89.29%
  Class recall                                 53.85%         96.15%
Fast large margin                                                                              89.0%
  Predicted as petty corruption intention      6              1              85.71%
  Predicted as no petty corruption intention   6              51             89.47%
  Class recall                                 50.0%          98.08%
Decision tree                                                                                  79.7%
  Predicted as petty corruption intention      0              0              0.0%
  Predicted as no petty corruption intention   13             51             79.69%
  Class recall                                 0.0%           100.0%
Random forest                                                                                  85.9%
  Predicted as petty corruption intention      5              2              71.43%
  Predicted as no petty corruption intention   7              50             87.72%
  Class recall                                 41.67%         96.15%
Gradient boosted trees                                                                         90.5%
  Predicted as petty corruption intention      5              0              100.0%
  Predicted as no petty corruption intention   6              53             89.83%
  Class recall                                 45.45%         100.0%
Support vector machine                                                                         89.2%
  Predicted as petty corruption intention      9              3              75.0%
  Predicted as no petty corruption intention   4              49             92.45%
  Class recall                                 69.23%         94.23%
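
Because the classes are imbalanced, it can help to read the accuracies in Table 4 against a majority-class baseline, that is, the accuracy of always predicting class 0. The short Python sketch below is a supplementary check rather than part of the study's method; the function name is hypothetical, and the counts come from Table 2 and from the decision tree rows of Table 4.

```python
def majority_baseline_accuracy(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


full_dataset = {0: 181, 1: 44}  # class distribution from Table 2
holdout = {0: 51, 1: 13}        # real-class totals in the decision tree rows of Table 4

print(round(100 * majority_baseline_accuracy(full_dataset), 2))  # 80.44
print(round(100 * majority_baseline_accuracy(holdout), 2))       # 79.69
```

Under these counts, the decision tree's 79.69% accuracy equals the hold-out baseline, which is consistent with its never predicting class 1.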


Overall, the highest accuracy was achieved by the gradient boosted trees algorithm in the
prediction model. Gradient boosted trees also achieved the highest class precision for petty corruption
intention (100%), although the training and testing data contain very few cases of this class (see Table 2).
All machine learning algorithms except the decision tree were more precise than sensitive in classifying
class 1. In other words, the models with these algorithms detect petty corruption intentions with precision
values above 70%, whereas the corresponding recall values are below 70%. Owing to the greater exposure to
class 0 in the dataset, all six machine learning algorithms show an outstanding ability to detect cases of no
intention of petty corruption, with recall for class 0 above 90%. Furthermore, it is of interest to examine
how the variation in each attribute's correlation to the DV influenced the machine learning performance.
Table 5 presents the comparison of the correlation weights.


Table 5. The weights of correlations of each GST and demography attribute

Attributes               Generalized    Fast large   Decision   Random   Gradient boosted   Support vector
                         linear model   margin       tree       forest   trees              machine
GST
Work stress              0.017          0.010        0.019      0.034    0.127              0.092
Financial stress         0.030          0.031        0.008      0.049    0.028              0.079
Leadership pressure      0.006          0.005        0.148      0.021    0.054              0.021
Peer pressure            0.183          0.177        0.008      0.148    0.135              0.167
Demography
Gender                   0.092          0.085        0.105      0.048    0.060              0.026
Race                     0.002          0.058        0.004      0.017    0.013              0.071
Marital status           0.059          0.163        0.011      0.047    0.039              0.065
Working experiences      0.013          0.006        0.011      0.034    0.032              0.036
Academic qualification   0.066          0.183        0.008      0.041    0.025              0.054
Officer rank             0.083          0.185        0.014      0.034    0.0066             0.105
Table 5 shows that the GST attributes had more impact on the performance of the machine learning
algorithms in the prediction models than the demographic attributes. The majority of the machine learning
algorithms gave the highest weight to peer pressure. This is consistent with the results of the correlation
outside the machine learning prediction models depicted in Figure 2. The findings are also in line with prior
research, which found a significant relationship between peer exposure and employee misconduct [30], as
peers can serve as a model that influences the behaviors and attitudes of others in the group [31]. In the
decision tree, leadership pressure became the most important attribute. In fast large margin, however,
academic qualification is the most influential attribute.
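
The comparison among the GST attributes can be reproduced from the GST rows of Table 5 with a few lines of Python. This sketch is illustrative: the dictionary layout and variable names are hypothetical, the weights are transcribed from Table 5, and the demographic rows are deliberately left out.

```python
# GST weights transcribed from Table 5, one dictionary per algorithm.
gst_weights = {
    "Generalized linear model": {"Work stress": 0.017, "Financial stress": 0.030,
                                 "Leadership pressure": 0.006, "Peer pressure": 0.183},
    "Fast large margin": {"Work stress": 0.010, "Financial stress": 0.031,
                          "Leadership pressure": 0.005, "Peer pressure": 0.177},
    "Decision tree": {"Work stress": 0.019, "Financial stress": 0.008,
                      "Leadership pressure": 0.148, "Peer pressure": 0.008},
    "Random forest": {"Work stress": 0.034, "Financial stress": 0.049,
                      "Leadership pressure": 0.021, "Peer pressure": 0.148},
    "Gradient boosted trees": {"Work stress": 0.127, "Financial stress": 0.028,
                               "Leadership pressure": 0.054, "Peer pressure": 0.135},
    "Support vector machine": {"Work stress": 0.092, "Financial stress": 0.079,
                               "Leadership pressure": 0.021, "Peer pressure": 0.167},
}

# For each algorithm, pick the GST attribute with the largest weight.
top_gst_attribute = {
    algorithm: max(weights, key=weights.get)
    for algorithm, weights in gst_weights.items()
}

for algorithm, attribute in top_gst_attribute.items():
    print(f"{algorithm}: {attribute}")
```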


5. CONCLUSION 

This paper presents significant findings of research concerned with the work ethics and integrity of
professional employees. Focusing on petty corruption intentions among law enforcement officers, this
paper presents the design and empirical findings of machine learning predictive algorithms. Pressure from
peers was found to be the most influential factor in petty corruption intentions among the officers in the
tested dataset, and hence became the main contributor in most of the machine learning algorithms. The
analysis presented in this paper conveys valuable information for future research, which can explore more
advanced machine learning predictive algorithms to detect corruption among law enforcement officials.